[SPARK-19831][CORE] Reuse the existing cleanupThreadExecutor to clean up the directories of finished applications to avoid the block by hustfxj · Pull Request #17189 · apache/spark

hustfxj · 2017-03-07T12:02:10Z

Cleaning up a finished application's directories can take a long time on the worker. Because Worker extends ThreadSafeRpcEndpoint, it handles messages one at a time, so a slow ApplicationFinished message blocks the worker from sending heartbeats to the master. If the heartbeat from a worker is blocked by the ApplicationFinished message, the master will think the worker is dead. If the worker is running a driver, the master will then schedule that driver again.
It is better to reuse the existing cleanupThreadExecutor to clean up the directories of finished applications so that the worker is not blocked.
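The blocking described above can be illustrated with a small standalone sketch. It does not use Spark: a single-threaded executor stands in for the serial message handling of a ThreadSafeRpcEndpoint, and `HeartbeatBlockingSketch`, `slowCleanup`, and `heartbeatDelayMillis` are hypothetical names introduced only for illustration.

```scala
import java.util.concurrent.{CountDownLatch, Executors, TimeUnit}
import scala.concurrent.{ExecutionContext, Future}

object HeartbeatBlockingSketch {
  // Hypothetical stand-in for a slow recursive directory deletion.
  def slowCleanup(millis: Long): Unit =
    try Thread.sleep(millis)
    catch { case _: InterruptedException => Thread.currentThread().interrupt() }

  /** Returns how long (in ms) a heartbeat waited behind an "application finished" message. */
  def heartbeatDelayMillis(offload: Boolean, cleanupMillis: Long): Long = {
    // One thread models the one-message-at-a-time handling of the endpoint.
    val messageLoop = Executors.newSingleThreadExecutor()
    // A second single thread models a dedicated cleanup executor.
    val cleanupPool = Executors.newSingleThreadExecutor()
    val cleanupEc = ExecutionContext.fromExecutorService(cleanupPool)
    val heartbeatHandled = new CountDownLatch(1)
    val start = System.nanoTime()
    try {
      // Message 1: the application finished, so its directories must be cleaned up.
      messageLoop.execute(new Runnable {
        override def run(): Unit =
          if (offload) Future(slowCleanup(cleanupMillis))(cleanupEc) // returns immediately
          else slowCleanup(cleanupMillis)                            // blocks the message loop
      })
      // Message 2: the heartbeat, queued behind message 1.
      messageLoop.execute(new Runnable {
        override def run(): Unit = heartbeatHandled.countDown()
      })
      heartbeatHandled.await(10, TimeUnit.SECONDS)
      TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start)
    } finally {
      messageLoop.shutdownNow()
      cleanupPool.shutdownNow()
    }
  }
}
```

With `offload = false` the heartbeat waits for the whole cleanup; with `offload = true` it is handled almost immediately because the message loop only enqueues the work.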

jerryshao · 2017-03-08T12:55:23Z

I think you need to call cleanupApplicationThreadExecutor.shutdownNow() when the worker is stopped.
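A minimal sketch of this suggestion, assuming a dedicated executor owned by a worker-like class. `CleanupOwnerSketch`, `submitCleanup`, and `isStopped` are hypothetical names; only `shutdownNow()` and the executor field name come from the comment above, and the real Worker lifecycle hook may look different.

```scala
import java.util.concurrent.{ExecutorService, Executors, RejectedExecutionException}

final class CleanupOwnerSketch {
  // Field name taken from the review comment; the surrounding class is hypothetical.
  private val cleanupApplicationThreadExecutor: ExecutorService =
    Executors.newSingleThreadExecutor()

  /** Queues a cleanup task; returns false if the executor has already been shut down. */
  def submitCleanup(task: () => Unit): Boolean =
    try {
      cleanupApplicationThreadExecutor.execute(new Runnable {
        override def run(): Unit = task()
      })
      true
    } catch {
      case _: RejectedExecutionException => false
    }

  /** Called when the owner stops: interrupts running work and drops queued tasks. */
  def onStop(): Unit = {
    cleanupApplicationThreadExecutor.shutdownNow()
  }

  def isStopped: Boolean = cleanupApplicationThreadExecutor.isShutdown
}
```

Without the `shutdownNow()` call, the non-daemon executor thread would outlive the owner and could keep the JVM from exiting.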

zsxwing · 2017-03-08T19:16:39Z

core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala

-        logInfo(s"Cleaning up local directories for application $id")
-        dirList.foreach { dir =>
-          Utils.deleteRecursively(new File(dir))
+      cleanupApplicationThreadExecutor.submit(new Runnable {


I prefer to just reuse the existing cleanupThreadExecutor like this:

    appDirectories.remove(id).foreach { dirList =>
      concurrent.Future {
        logInfo(s"Cleaning up local directories for application $id")
        dirList.foreach { dir =>
          Utils.deleteRecursively(new File(dir))
        }
      }(cleanupThreadExecutor).onFailure {
        case e: Throwable =>
          logError(s"Clean up app dir $dirList failed: ${e.getMessage}", e)
      }(cleanupThreadExecutor)
    }
    shuffleService.applicationRemoved(id)

I agree with you, but I am worried that cleaning up both the workDir and the application directories on the same executor could take a long time.

That won't become an issue. In the worst case, some tasks will be pending in the queue of cleanupThreadExecutor. Since the number of applications is not huge, that is acceptable.
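The queueing behaviour mentioned here can be sketched as follows, assuming the executor is backed by a single thread. `PendingCleanupSketch` and `runCleanups` are hypothetical names, and `Thread.sleep` stands in for the actual directory deletion.

```scala
import java.util.concurrent.Executors
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, ExecutionContext, ExecutionContextExecutorService, Future}
import scala.concurrent.duration._

object PendingCleanupSketch {
  /** Submits one cleanup per application id and returns each id with its execution index. */
  def runCleanups(appIds: Seq[String]): Seq[(String, Int)] = {
    val pool = Executors.newSingleThreadExecutor()
    implicit val ec: ExecutionContextExecutorService =
      ExecutionContext.fromExecutorService(pool)
    val executionOrder = new AtomicInteger(0)
    try {
      // Each Future is enqueued immediately; the single thread drains the queue in order.
      val futures = appIds.map { id =>
        Future {
          Thread.sleep(10) // hypothetical stand-in for deleting the app's directories
          (id, executionOrder.getAndIncrement())
        }
      }
      Await.result(Future.sequence(futures), 10.seconds)
    } finally {
      pool.shutdown()
    }
  }
}
```

Submitting never blocks the caller; later cleanups simply wait in the executor's queue until earlier ones finish.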

zsxwing · 2017-03-08T19:16:57Z

ok to test

SparkQA · 2017-03-08T21:57:33Z

Test build #74220 has finished for PR 17189 at commit a353262.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-03-09T14:41:51Z

Test build #74266 has finished for PR 17189 at commit cbf14b2.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

zsxwing · 2017-03-09T18:31:27Z

core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala


-  // A separated thread to clean up the workDir. Used to provide the implicit parameter of `Future`
-  // methods.
+  // A separated thread to clean up the workDir and the finished application.


nit: the directories of finished applications.

zsxwing · 2017-03-09T18:32:43Z

@hustfxj Looks pretty good. Could you update the PR title and description to reflect the latest changes?

hustfxj · 2017-03-12T08:02:39Z

@zsxwing Of course. Thank you for the reminder.

SparkQA · 2017-03-12T10:39:31Z

Test build #74397 has finished for PR 17189 at commit 8e44aab.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

zsxwing · 2017-03-12T17:28:17Z

LGTM. Merging to master. Thanks!

[SPARK-19831][CORE] Use a separate thread to clean up the finished ap…

a353262

…plication to avoid the block

zsxwing requested changes Mar 8, 2017

A separated thread to clean up the workDir and the finished application

cbf14b2

zsxwing reviewed Mar 9, 2017

hustfxj changed the title ~~[SPARK-19831][CORE] Use a separate thread to clean up the finished application to avoid the block~~ [SPARK-19831][CORE] Use a separate thread to clean up the directories of finished applications to avoid the block Mar 12, 2017

hustfxj changed the title ~~[SPARK-19831][CORE] Use a separate thread to clean up the directories of finished applications to avoid the block~~ [SPARK-19831][CORE] Reuse the existing cleanupThreadExecutor to clean up the directories of finished applications to avoid the block Mar 12, 2017

typo

8e44aab

asfgit closed this in 2f5187b Mar 12, 2017

hustfxj wants to merge 3 commits into apache:master from hustfxj:worker-hearbeat


4 participants
